DRAFT GP-Gammon: Genetically Programming Backgammon Players
نویسندگان
چکیده
We apply genetic programming to the evolution of strategies for playing the game of backgammon. We explore two different strategies of learning: using a fixed external opponent as teacher, and letting the individuals play against each other. We conclude that the second approach is better and leads to excellent results: Pitted in a 1000-game tournament against a standard benchmark player—Pubeval— our best evolved program wins 62.4% of the games, the highest result to date. Moreover, several other evolved programs attain win percentages not far behind the champion, evidencing the repeatability of our approach.
منابع مشابه
Using GP-Gammon: Using Genetic Programming to Evolve Backgammon Players
We apply genetic programming to the evolution of strategies for playing the game of backgammon. Pitted in a 1000-game tournament against a standard benchmark player—Pubeval—our best evolved program wins 58% of the games, the highest verifiable result to date. Moreover, several other evolved programs attain win percentages not far behind the champion, evidencing the repeatability of our approach.
متن کاملProgramming backgammon using self-teaching neural nets
TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results. Starting from random initial play, TD-Gammon’s selfteaching methodology results in a surprisingly strong program: without lookahead, its positional judgement rivals that of human experts, and when combined with shallow lookahead, it reaches a level of pla...
متن کاملTD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play
TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(X) reinforcement learning algorithm (Sutton 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learn...
متن کاملM2ICAL Analyses HC-Gammon
We analyse Pollack and Blair’s HC-Gammon backgammon program using a new technique that performs Monte Carlo simulations to derive a Markov Chain model for Imperfect Comparison ALgorithms, called the MICAL method, which models the behavior of the algorithm using a Markov chain, each of whose states represents a class of players of similar strength. The Markov chain transition matrix is populated...
متن کاملFeature Construction for Reinforcement Learning in Hearts
Temporal difference (TD) learning has been used to learn strong evaluation functions in a variety of two-player games. TD-gammon illustrated how the combination of game tree search and learning methods can achieve grand-master level play in backgammon. In this work, we develop a player for the game of hearts, a 4-player game, based on stochastic linear regression and TD learning. Using a small ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005